Containers


Set Up

All prerequisites, links to material and slides for this course can be found on GitHub.

They can also be downloaded as a zip archive from here.

Course materials

Once the zip file is unarchived, all presentations (as HTML slides and single pages), their R code, and the HTML practical sheets will be available in the directories underneath.

  • presentations/slides/ Presentations as an HTML slide show.
  • presentations/singlepage/ Presentations as an HTML single page.
  • presentations/r_code/ R code in presentations.
  • exercises/ Practicals as HTML pages.
  • answers/ Practicals with answers as HTML pages and R code solutions.

What are containers? Why should we use them?


The problem

Something (e.g. bioinformatics analysis or software deployment) works on your computer, and you want to make sure that it will work on another computer.

https://jhudatascience.org/Adv_Reproducibility_in_Cancer_Informatics/launching-a-docker-image.html - CC-BY 4.0

The solution - Docker!

Docker allows for the creation of an isolated environment that can be shipped across different users, machines, or operating systems, and to virtual machines or the cloud.

https://jhudatascience.org/Adv_Reproducibility_in_Cancer_Informatics/launching-a-docker-image.html - CC-BY 4.0

Installing Docker

Use this link to install Docker.

  • Click on the Docker desktop icon and make an account with Docker
  • Docker must be open and running to use the command line interface (CLI), which is how we will primarily use Docker
  • See here for Docker CLI commands

Check Docker version to make sure Docker is installed and running

Code (terminal):

docker --version

Installing Docker

If the previous command isn’t found, check the Docker Desktop advanced settings and make sure the CLI tools are available system-wide.
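A quick way to tell a missing CLI apart from a stopped daemon is sketched below. It relies only on `command -v` and `docker info`; the variable name `docker_status` is made up for this example.

```shell
#!/bin/sh
# Sketch: distinguish "CLI not on PATH" from "daemon not running".
# docker_status is a hypothetical variable name used only in this example.
if ! command -v docker >/dev/null 2>&1; then
    docker_status="cli-missing"      # the docker command is not on PATH
elif ! docker info >/dev/null 2>&1; then
    docker_status="daemon-stopped"   # CLI found, but Docker Desktop / the daemon is not running
else
    docker_status="ok"
fi
echo "Docker status: $docker_status"
```

If this reports "daemon-stopped", opening Docker Desktop and waiting for it to finish starting is usually enough.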

Docker infrastructure

  • The Docker client communicates with the Docker daemon based on user commands
    • The daemon is the engine that manages Docker services and objects (e.g. images and containers)
  • A Docker image is a read-only template or blueprint for running a container.
    • The image contains an isolated file system that is defined in a text file called a Dockerfile
  • Once an image is built, an instance of this image can be launched as a container

Pulling Docker images

There are public repositories of Docker images (e.g. Docker Hub). Typically you start with an existing image and build on top of it.

Rocker is a very useful source of images on Docker Hub for R and RStudio. We can pull these images immediately after installing Docker. Here we pull an image containing RStudio and a specific version of R.

Code (terminal):

docker pull rocker/rstudio:4.2.3

Docker images

After pulling, the image is now available on our system to run.


Code (terminal):

docker images

Output:

Confirm in Docker desktop:

Running docker containers

Once the image is on our system, we can launch a container with the ‘docker run’ command.

Components of the run command:

  • --rm: automatically removes the container when you exit; otherwise old, unused containers accumulate and take up space on your computer
  • -p: the port on your computer to be exposed (before the colon) and the port inside the container (after the colon)
  • -e: sets an environment variable when the container is run; here it sets the password used to log in
  • the last argument is the image name followed by the tag (both shown by ‘docker images’)

Code (terminal):

docker run --rm \
           -p 8787:8787 \
           -e PASSWORD=password \
           rocker/rstudio:4.2.3
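If port 8787 is already in use on your computer, only the number before the colon needs to change. The sketch below assembles the same command from named parts and prints it rather than running it; the variable names (`HOST_PORT`, `CONTAINER_PORT`, `RSTUDIO_PASSWORD`, `IMAGE`) and the choice of port 8788 are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: the same 'docker run' call with each component named.
# All variable names here are illustrative, not Docker settings.
HOST_PORT=8788             # port on your computer (before the colon)
CONTAINER_PORT=8787        # port RStudio listens on inside the container (after the colon)
RSTUDIO_PASSWORD=password  # passed to the container as the PASSWORD environment variable
IMAGE=rocker/rstudio:4.2.3 # image name followed by the tag

run_cmd="docker run --rm -p ${HOST_PORT}:${CONTAINER_PORT} -e PASSWORD=${RSTUDIO_PASSWORD} ${IMAGE}"

# Print the command so it can be inspected before it is run
echo "$run_cmd"
```

With this mapping, RStudio would be reached at http://localhost:8788 once the printed command is run.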

Running docker containers

While the container is running, we can go to ‘http://localhost:8787’ in a browser and log in with the password from ‘docker run’.

This brings us to a normal RStudio interface

Listing active docker containers

To see all containers running in the local environment, use the ‘docker ps’ command

Code (terminal):

docker ps

Output:

Stopping docker containers

To stop a running container from the terminal tab where it was launched, press Ctrl+C.

Alternatively, open another tab and use the ‘docker stop’ command with the container ID listed by ‘docker ps’.

Code (terminal):

docker stop 6ee1e0e97bf8 # this is the ID from 'docker ps'
docker ps

Output:
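Copying the ID by hand can be avoided by asking ‘docker ps’ for it directly. The sketch below uses the `-q` flag (print IDs only) and the `ancestor` filter (containers created from a given image); the `IMAGE` variable name is an assumption for this example, and the script does nothing if Docker is not available.

```shell
#!/bin/sh
# Sketch: stop every running container that was started from a given image.
# IMAGE is an illustrative variable name.
IMAGE=rocker/rstudio:4.2.3

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    # -q prints only container IDs; the ancestor filter keeps containers created from IMAGE
    ids=$(docker ps -q --filter "ancestor=${IMAGE}")
    if [ -n "$ids" ]; then
        docker stop $ids   # unquoted on purpose: each ID becomes its own argument
    else
        echo "No running containers from ${IMAGE}"
    fi
else
    echo "Docker is not available; nothing to stop"
fi
```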

Adding volumes to containers

The Docker container has its own file system, and we can mount a local directory onto that file system with the ‘-v’ argument to the ‘docker run’ command.

  • Navigate to the ‘r_course’ directory within the downloaded course using the ‘cd’ command in the terminal
  • Use the ‘docker run’ command with the ‘-v’ argument
    • the left side of the colon is the path on your computer to mount
    • the right side is the location within the docker container file system where that data will be accessible
    • /home/rstudio is the working directory of the rstudio session set by the Rocker image

Code (terminal):

# navigate to 'r_course' directory in downloaded material
cd ~/Downloads/Reproducible_R-master/r_course

# launch docker container
# (if your Docker version rejects the relative path, use "$(pwd)/data" instead of ./data)
docker run --rm \
           -v ./data:/home/rstudio \
           -p 8787:8787 \
           -e PASSWORD=password \
           rocker/rstudio:4.2.3

Adding volumes to containers

The RStudio interface now shows the files in the ‘data’ directory

Adding volumes to containers

These files can be read into R, and also files can be written to the local environment

Code (R in docker image):

dataIn <- read.csv("readThisTable.csv")
head(dataIn, 2)
# add gene IDs and write to new file on local computer
dataIn$Gene_ID <- seq(nrow(dataIn))
write.csv(dataIn, "rnaseq_table_withIDs.csv")

Output:

Adding volumes to containers

RStudio writes its session and environment files to the working directory in the container. Because that directory is the mounted volume, these files appear in the local ‘data’ directory as hidden folders.

This R environment will then be loaded the next time you launch an RStudio container with this volume mounted. If these folders (.config and .local) are removed, a fresh RStudio session will be launched.

Code (terminal):

ls -a data

Output:
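Removing the two hidden folders can be wrapped in a small helper, sketched below. `reset_rstudio_state` is a hypothetical function name, not a Docker or RStudio command, and the demonstration runs on a throwaway directory rather than the real ‘data’ folder.

```shell
#!/bin/sh
# Sketch: remove the hidden RStudio state folders from a mounted directory
# so that the next container starts with a fresh session.
# reset_rstudio_state is a hypothetical helper name.
reset_rstudio_state() {
    dir=$1
    for hidden in .config .local; do
        if [ -d "$dir/$hidden" ]; then
            rm -rf "${dir:?}/$hidden"   # ${dir:?} aborts if dir is empty
            echo "removed $dir/$hidden"
        fi
    done
}

# Demonstration on a temporary directory that mimics the mounted folder
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/.config" "$demo_dir/.local"
touch "$demo_dir/readThisTable.csv"
reset_rstudio_state "$demo_dir"
ls -a "$demo_dir"
```

For the course material this would be called as `reset_rstudio_state ./data` from the ‘r_course’ directory, while no container is using the mount. Data files in the directory are left untouched.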

Customizing RStudio Docker image

The image we pull from Rocker contains base R and its associated packages. To customize the image, we will need to make a Dockerfile that builds on top of the Rocker image.

A Dockerfile provides the recipe to make the image, and is a text file that can include a series of specialized commands. This includes instructions to install R packages and their dependencies.

Some examples:

  • FROM: sets the base image; further instructions build on top of it
  • RUN: executes a command as if in a terminal
  • LABEL: adds metadata to the image
  • COPY: copies files from the host system to the image file system
  • CMD: the command that will be run when the container is launched

Dockerfile

Here we start with the same RStudio base image we used previously, and then add some key R packages.

Dockerfile

The first RUN command installs system dependencies that are common to R packages. This command looks for updates, installs, and cleans up unnecessary files. Adding more R packages could result in missing dependencies, which you can pick up in the log for the build command (next slide). Dependencies for CRAN packages can also be found here.

Dockerfile

Then the R packages are installed using ‘install.packages’ or ‘BiocManager::install’ for Bioconductor packages.

Dockerfile

Finally, port 8787 is exposed and the ‘init’ script that is included with the base RStudio image is set to run when the container launches.
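The structure described on these slides can be sketched as follows. This is not the course's Dockerfile: the system libraries, the R packages, and the label text are assumptions chosen for illustration. The file is written to a temporary directory (`build_dir`, a name made up here) so that nothing in the course material is overwritten.

```shell
#!/bin/sh
# Sketch: write a minimal Dockerfile with the structure described above.
# The package choices below are illustrative assumptions, not the course's list.
build_dir=$(mktemp -d)

cat > "$build_dir/Dockerfile" <<'EOF'
# Base image: the same RStudio image pulled earlier
FROM rocker/rstudio:4.2.3

LABEL description="RStudio 4.2.3 with extra R packages (example)"

# System dependencies: update, install, then clean up the apt cache
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        libxml2-dev \
        libcurl4-openssl-dev \
        libssl-dev \
    && rm -rf /var/lib/apt/lists/*

# R packages from CRAN, then a Bioconductor package via BiocManager
RUN R -e "install.packages(c('BiocManager', 'ggplot2'), repos = 'https://cloud.r-project.org')"
RUN R -e "BiocManager::install('GenomicRanges', ask = FALSE, update = FALSE)"

# RStudio Server port and the start-up script shipped with the base image
EXPOSE 8787
CMD ["/init"]
EOF

echo "Wrote $build_dir/Dockerfile"
```

Such a file could then be built with `docker build -t rstudio_4.2.3_example "$build_dir"`, where the tag name is again only an example.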

Build image from Dockerfile

  • the directory that contains the Dockerfile is the last argument
  • if no filename is given, it will look for a file called ‘Dockerfile’
    • ‘Dockerfile’ is in the data directory of course materials

Code (terminal):

docker build -t rstudio_4.2.3_v1 ./data

Output:

Build image from Dockerfile

Use the ‘docker images’ command to see the new image.

Code (terminal):

docker images

Output:

Build image from Dockerfile

As done previously, use the ‘docker run’ command to launch a container with our customized RStudio session

Code (terminal):

docker run --rm \
          -v ./data:/home/rstudio \
          -p 8787:8787 \
          -e PASSWORD=password \
          rstudio_4.2.3_v1 

Output:

Use Herper for conda packages

  • the directory that contains the Dockerfile is the last argument
  • here the Dockerfile has a different name, so we specify the exact path with ‘-f’ argument

Code (terminal):

docker build -t rstudio_4.2.3_salmon -f ./data/Dockerfile_salmon ./data/

Output:

Use Herper for conda packages

Code (terminal):

docker images

Output:

Use Herper for conda packages

Code (R in docker image):

library(Herper)
# the environment name and miniconda path set in the Dockerfile
Herper::local_CondaEnv(new = "pipe_env", 
                       pathToMiniConda = "/home/miniconda")
# test out salmon
system("salmon -h")

Output:

Run containers from Docker Desktop

Further Resources

Exercises

Exercise on Reproducibility in R can be found here

Contact

For any suggestions, comments, edits or questions (about the content or the slides themselves), please raise an issue on our GitHub.